
    Applying Rule Ensembles to the Search for Super-Symmetry at the Large Hadron Collider

    In this note we give an example application of a recently presented predictive learning method called Rule Ensembles. The application we present is the search for super-symmetric particles at the Large Hadron Collider. In particular, we consider the problem of separating the background coming from top quark production from the signal of super-symmetric particles. The method is based on an expansion in base learners, each learner being a rule, i.e., a combination of cuts in the variable space describing signal and background. These rules are generated from an ensemble of decision trees. One output of the method is a set of rules (cuts) ordered according to their importance, which provides a useful tool for diagnosing the model. We also compare the method to a number of other multivariate methods, in particular Artificial Neural Networks, the likelihood method, and the recently presented boosted decision tree method. We find better performance from Rule Ensembles in all cases. For example, for a given significance, the amount of data needed to claim SUSY discovery could be reduced by 15% using Rule Ensembles as compared to a likelihood method.
    Comment: 24 pages, 7 figures; replaced to match the version accepted for publication in JHEP
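    As a rough illustration of the rule-ensemble idea described above, the sketch below harvests root-to-leaf cut conjunctions from a small gradient-boosted tree ensemble and fits an L1-regularised linear model over the resulting rule indicators, whose coefficients give a rule-importance ordering. This is a RuleFit-style reconstruction on toy data, not the authors' code; the helper names (`extract_rules`, `rule_matrix`) and all settings are my own.

```python
# RuleFit-style sketch: rules are root-to-leaf cut conjunctions from a tree
# ensemble, used as binary features in a sparse linear model whose weights
# order the rules by importance, as described in the abstract.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

def extract_rules(tree, feature_names):
    """Collect every root-to-leaf path as a list of (feature, op, threshold) cuts."""
    t = tree.tree_
    rules = []
    def walk(node, path):
        if t.children_left[node] == -1:          # leaf: the path itself is a rule
            if path:
                rules.append(list(path))
            return
        f, thr = feature_names[t.feature[node]], t.threshold[node]
        walk(t.children_left[node], path + [(f, "<=", thr)])
        walk(t.children_right[node], path + [(f, ">", thr)])
    walk(0, [])
    return rules

def rule_matrix(X, rules, name_to_col):
    """Binary indicator matrix: entry (i, j) is 1 iff event i satisfies rule j."""
    Z = np.ones((X.shape[0], len(rules)))
    for j, rule in enumerate(rules):
        for f, op, thr in rule:
            col = X[:, name_to_col[f]]
            Z[:, j] *= (col <= thr) if op == "<=" else (col > thr)
    return Z

# Toy signal/background sample (stand-in for SUSY vs. top kinematic variables).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 > 0.5).astype(int)
names = [f"var{i}" for i in range(4)]
name_to_col = {n: i for i, n in enumerate(names)}

gbm = GradientBoostingClassifier(n_estimators=30, max_depth=3).fit(X, y)
rules = [r for est in gbm.estimators_[:, 0] for r in extract_rules(est, names)]
Z = rule_matrix(X, rules, name_to_col)

lin = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(Z, y)
top = np.argsort(-np.abs(lin.coef_[0]))[:5]
for j in top:                                    # the most important cuts
    print(f"weight {lin.coef_[0][j]:+.3f}  rule: {rules[j]}")
```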

    Bagging ensemble selection for regression

    Bagging ensemble selection (BES) is a relatively new ensemble learning strategy. The strategy can be seen as a bagged version of the ensemble selection from libraries of models (ES) strategy. Previous experimental results on binary classification problems have shown that, using random trees as base classifiers, BES-OOB (the most successful variant of BES) is competitive with, and in many cases superior to, other ensemble learning strategies such as the original ES algorithm, stacking with linear regression, random forests, and boosting. Motivated by the promising results in classification, this paper examines the predictive performance of the BES-OOB strategy on regression problems. Our results show that BES-OOB outperforms Stochastic Gradient Boosting and Bagging when regression trees are used as the base learners. They also suggest that the advantage of a diverse model library becomes clear when the library is relatively large. Finally, we present encouraging results indicating that the non-negative least squares algorithm is a viable approach for pruning an ensemble of ensembles.
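    A minimal sketch of the two ingredients discussed above, under my own assumptions about the details: greedy ensemble selection against hold-out predictions (standing in for the out-of-bag data of BES-OOB), and non-negative least squares as a pruning/weighting step over the model library. Library size, model choices, and the greedy loop are illustrative.

```python
# Sketch: greedy ensemble selection on hold-out data, plus NNLS pruning that
# assigns non-negative weights to library models (zeros drop models entirely).
import numpy as np
from scipy.optimize import nnls
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=600, n_features=10, noise=5.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Build a model library of randomised regression trees on bootstrap samples.
rng = np.random.default_rng(0)
library = []
for _ in range(50):
    idx = rng.integers(0, len(X_tr), len(X_tr))
    library.append(DecisionTreeRegressor(max_features="sqrt").fit(X_tr[idx], y_tr[idx]))
P = np.column_stack([m.predict(X_val) for m in library])   # library predictions

# Greedy ensemble selection (with replacement) against the hold-out data.
chosen, current = [], np.zeros(len(y_val))
for _ in range(25):
    errs = [np.mean(((current * len(chosen) + P[:, j]) / (len(chosen) + 1) - y_val) ** 2)
            for j in range(P.shape[1])]
    j = int(np.argmin(errs))
    chosen.append(j)
    current = P[:, chosen].mean(axis=1)
print("greedy selection MSE:", np.mean((current - y_val) ** 2))

# NNLS pruning: non-negative weights over the whole library.
w, _ = nnls(P, y_val)
print("NNLS keeps", int((w > 0).sum()), "of", len(library), "models")
print("NNLS ensemble MSE:", np.mean((P @ w - y_val) ** 2))
```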

    Kernel density classification and boosting: an L2 analysis

    Kernel density estimation is a commonly used approach to classification. However, most of the theoretical results for kernel methods apply to estimation per se and not necessarily to classification. In this paper we show that when estimating the difference between two densities, the optimal smoothing parameters are increasing functions of the sample size of the complementary group, and we provide a small simulation study which examines the relative performance of kernel density methods when the final goal is classification. A relative newcomer to the classification portfolio is “boosting”, and this paper proposes an algorithm for boosting kernel density classifiers. We note that boosting is closely linked to a previously proposed method of bias reduction in kernel density estimation and indicate how it will enjoy similar properties for classification. We show that boosting kernel classifiers reduces the bias whilst only slightly increasing the variance, with an overall reduction in error. Numerical examples and simulations are used to illustrate the findings, and we also suggest further areas of research.
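    A small sketch of one way to boost kernel density classifiers, assuming an AdaBoost-style reweighting (the paper's exact update may differ): each round fits weighted KDEs per class, votes via the log density ratio, and upweights misclassified points.

```python
# Boosted kernel density classifier sketch: weighted per-class KDEs as the
# weak learner, discrete AdaBoost reweighting of misclassified points.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
x0 = rng.normal(-1.0, 1.0, 200)          # class 0 sample
x1 = rng.normal(+1.0, 1.0, 200)          # class 1 sample
x = np.concatenate([x0, x1])
y = np.concatenate([np.zeros(200), np.ones(200)])

w = np.ones_like(x) / len(x)             # boosting weights
F = np.zeros_like(x)                     # additive vote in favour of class 1

for _ in range(5):
    kde0 = gaussian_kde(x[y == 0], weights=w[y == 0])
    kde1 = gaussian_kde(x[y == 1], weights=w[y == 1])
    # log density ratio acts as the weak learner's decision function
    h = np.log(kde1(x) + 1e-12) - np.log(kde0(x) + 1e-12)
    pred = (h > 0).astype(float)
    err = np.sum(w * (pred != y))
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
    F += alpha * np.sign(h)
    w *= np.exp(alpha * (pred != y))     # upweight only the mistakes
    w /= w.sum()

acc = np.mean((F > 0) == (y == 1))
print(f"training accuracy after boosting: {acc:.3f}")
```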

    Immediate reward reinforcement learning for clustering and topology preserving mappings

    We extend a reinforcement learning algorithm which has previously been shown to cluster data. Our extension involves creating an underlying latent space with some pre-defined structure, which enables us to create a topology preserving mapping. We investigate different forms of the reward function, all of which are created with the intent of merging local and global information, thus avoiding one of the major difficulties of methods such as K-means, namely convergence to local optima that depend on the initial parameter values. We also show that the method is quite general and can be used with the recently developed method of stochastic weight reinforcement learning [14].
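    A toy sketch of immediate-reward reinforcement learning for clustering, under my own construction (not necessarily the paper's update rule): a prototype is sampled from a softmax policy over similarities, the immediate reward is the closeness of the chosen prototype, and a REINFORCE-style update with a running baseline moves the prototypes.

```python
# Immediate-reward RL clustering sketch: stochastic prototype selection,
# reward = similarity of the chosen prototype, REINFORCE prototype update.
import numpy as np

rng = np.random.default_rng(2)
# Three well-separated 2-D blobs.
X = np.concatenate([rng.normal(c, 0.3, (100, 2)) for c in ([0, 0], [3, 0], [0, 3])])
K, eta = 3, 0.1
M = rng.normal(0, 1, (K, 2))             # prototypes (the "actions")

for epoch in range(30):
    baseline = 0.0
    for i in rng.permutation(len(X)):
        x = X[i]
        d2 = np.sum((M - x) ** 2, axis=1)
        p = np.exp(-d2); p /= p.sum()     # stochastic policy over prototypes
        k = rng.choice(K, p=p)
        r = np.exp(-d2[k])                # immediate reward: closeness of choice
        baseline = 0.9 * baseline + 0.1 * r
        # REINFORCE: (r - baseline) * grad_M log p(k); for prototype j the
        # gradient of log p(k) is 2 * (x - M_j) * ([j == k] - p_j).
        for j in range(K):
            g = 2 * (x - M[j]) * ((j == k) - p[j])
            M[j] += eta * (r - baseline) * g
print("learned prototypes:\n", M.round(2))
```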

    Cost-sensitive Bayesian network learning using sampling

    A significant advance in recent years has been the development of cost-sensitive decision tree learners, recognising that real-world classification problems need to take account of the costs of misclassification and not just focus on accuracy. The literature contains well over 50 cost-sensitive decision tree induction algorithms, each with a different performance profile. Obtaining good Bayesian networks can be challenging, and hence several algorithms have been proposed for learning their structure and parameters from data. However, most of these algorithms focus on learning Bayesian networks that aim to maximise the accuracy of classifications. An obvious question therefore arises: is it possible to develop cost-sensitive Bayesian networks, and would they perform better than cost-sensitive decision trees at minimising classification cost? This paper explores the question by developing a new Bayesian network learning algorithm based on changing the data distribution to reflect the costs of misclassification. The proposed method is evaluated in experiments on over 20 data sets. The results show that this approach produces good results in comparison to more complex cost-sensitive decision tree algorithms.
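    The data-distribution idea above can be sketched with cost-proportionate rejection sampling, a standard way to fold misclassification costs into the training distribution; GaussianNB stands in here for a learned Bayesian network, and the 10:1 cost ratio is illustrative.

```python
# Cost-proportionate rejection sampling sketch: keep each example with
# probability cost / max_cost, then fit a Bayesian classifier on the
# resampled data and compare total misclassification cost.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
# Misclassification costs: a missed positive (class 1) is 10x worse.
cost = np.where(y == 1, 10.0, 1.0)

rng = np.random.default_rng(0)
keep = rng.random(len(y)) < cost / cost.max()   # rejection sampling step
Xs, ys = X[keep], y[keep]

plain = GaussianNB().fit(X, y)
shifted = GaussianNB().fit(Xs, ys)

def total_cost(model):
    pred = model.predict(X)
    fn = np.sum((pred == 0) & (y == 1)) * 10.0  # cost of missed positives
    fp = np.sum((pred == 1) & (y == 0)) * 1.0   # cost of false alarms
    return fn + fp

print("cost, plain model:        ", total_cost(plain))
print("cost, cost-shifted model: ", total_cost(shifted))
```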

    Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors

    Penalized regression is an attractive framework for variable selection problems. Often, variables possess a grouping structure, and the relevant selection problem is that of selecting groups, not individual variables. The group lasso has been proposed as a way of extending the ideas of the lasso to the problem of group selection. Nonconvex penalties such as SCAD and MCP have been proposed and shown to have several advantages over the lasso; these penalties may also be extended to the group selection problem, giving rise to group SCAD and group MCP methods. Here, we describe algorithms for fitting these models stably and efficiently. In addition, we present simulation results and real data examples comparing and contrasting the statistical properties of these methods.
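    A minimal numpy sketch of group descent for the group lasso over orthonormalised groups; the group SCAD and group MCP variants described in the paper would swap the soft-thresholding factor below for their own firm-thresholding operators. Dimensions, penalty level, and data are illustrative.

```python
# Group coordinate descent for the group lasso: with each group orthonormalised
# (X_g^T X_g = n I), the block update is a multivariate soft-threshold of the
# partial-residual projection z_g, zeroing whole groups at once.
import numpy as np

def group_lasso(X, y, groups, lam, n_iter=100):
    n, p = X.shape
    beta = np.zeros(p)
    r = y - X @ beta
    for _ in range(n_iter):
        for g in groups:                        # g is an index array for one group
            Xg = X[:, g]
            z = Xg.T @ r / n + beta[g]          # partial residual projection
            norm = np.linalg.norm(z)
            new = max(0.0, 1.0 - lam / norm) * z if norm > 0 else z * 0
            r -= Xg @ (new - beta[g])           # keep the residual in sync
            beta[g] = new
    return beta

rng = np.random.default_rng(0)
n, groups = 200, [np.arange(0, 3), np.arange(3, 6), np.arange(6, 9)]
X = rng.normal(size=(n, 9))
for g in groups:                                # orthonormalise each group
    Q, _ = np.linalg.qr(X[:, g])
    X[:, g] = Q * np.sqrt(n)
beta_true = np.r_[1.0, -1.0, 0.5, 0, 0, 0, 0, 0, 0]   # only group 1 is active
y = X @ beta_true + rng.normal(scale=0.5, size=n)

beta = group_lasso(X, y, groups, lam=0.2)
print(beta.round(2))    # groups 2 and 3 should be zeroed out exactly
```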

    Building Detection from Mobile Imagery Using Informative SIFT Descriptors

    We propose reliable outdoor object detection on mobile phone imagery from off-the-shelf devices. With the goal of providing both robust object detection and reduced computational complexity for situated interpretation of urban imagery, we propose to apply the 'Informative Descriptor Approach' to SIFT features (i-SIFT descriptors). We learn an attentive matching of i-SIFT keypoints, resulting in a significant improvement over state-of-the-art SIFT descriptor based keypoint matching. In the off-line learning stage, standard SIFT responses are first evaluated using an information theoretic quality criterion with respect to object semantics, rejecting features with an insufficient conditional entropy measure and producing both sparse and discriminative object representations. Second, we learn a decision tree from the training data set that maps SIFT descriptors to entropy values. The key advantages of informative SIFT (i-SIFT) over standard SIFT encoding are argued from observations on performance complexity, and demonstrated in a typical outdoor mobile vision experiment on the MPG-20 reference database.
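    A sketch of the entropy-based descriptor filtering recipe as I read it (a reconstruction, not the authors' pipeline): estimate P(object | descriptor) from nearest neighbours in descriptor space, compute its entropy, and keep only low-entropy, i.e. discriminative, keypoints. Random vectors stand in for real SIFT descriptors, and `informative_mask` with its thresholds is hypothetical.

```python
# Informative-descriptor filtering sketch: a k-NN estimate of the label
# posterior per descriptor, with high-entropy (ambiguous) descriptors rejected.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def informative_mask(desc, labels, k=10, max_entropy=0.5):
    """Boolean mask of descriptors whose k-NN label posterior has low entropy."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(desc)
    _, idx = nn.kneighbors(desc)
    idx = idx[:, 1:]                           # drop each point's self-match
    n_classes = labels.max() + 1
    keep = np.zeros(len(desc), dtype=bool)
    for i in range(len(desc)):
        p = np.bincount(labels[idx[i]], minlength=n_classes) / k
        h = -np.sum(p[p > 0] * np.log2(p[p > 0]))   # conditional entropy estimate
        keep[i] = h <= max_entropy
    return keep

# Toy stand-in for 128-D SIFT descriptors from two objects.
rng = np.random.default_rng(3)
desc = np.vstack([rng.normal(0, 1, (200, 128)), rng.normal(0.5, 1, (200, 128))])
labels = np.r_[np.zeros(200, int), np.ones(200, int)]

mask = informative_mask(desc, labels)
print(f"kept {mask.sum()} of {len(desc)} descriptors as informative")
```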

    Classifiers Based on Two-Layered Learning

    In this paper we present an exemplary classifier (classification algorithm) based on two-layered learning. In the first layer of learning, a collection of classifiers is induced from a part of the original training data set. In the second layer, classifiers are induced using patterns extracted from the already constructed classifiers on the basis of their performance on the remaining part of the training data. We report the results of experiments performed on the following data sets, well known from the literature: diabetes, heart disease, Australian credit (see [5]) and lymphography (see [4]). We compare the standard rough set method used to induce classifiers (see [1] for more details), based on minimal consistent decision rules (see [6]), with the classifier based on two-layered learning.
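    A stacking-style sketch of the two-layered scheme: base classifiers are induced on one part of the training data, and a second-layer classifier is trained on their outputs over the remaining part. Note the paper's first layer uses rough-set decision rules; the models below are generic stand-ins.

```python
# Two-layered learning sketch: layer 1 trains heterogeneous base classifiers on
# half the data; layer 2 trains a meta-classifier on their predictions over the
# other half.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_l1, X_l2, y_l1, y_l2 = train_test_split(X, y, test_size=0.5, random_state=0)

# Layer 1: a small collection of base classifiers on the first part.
bases = [DecisionTreeClassifier(max_depth=4), GaussianNB(), KNeighborsClassifier()]
for b in bases:
    b.fit(X_l1, y_l1)

# Layer 2: patterns are the base classifiers' outputs on the held-out part.
meta_X = np.column_stack([b.predict_proba(X_l2)[:, 1] for b in bases])
meta = LogisticRegression().fit(meta_X, y_l2)

print(f"second-layer training accuracy: {meta.score(meta_X, y_l2):.3f}")
```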

    A Study of Machine Learning Techniques for Daily Solar Energy Forecasting using Numerical Weather Models

    Proceedings of: 8th International Symposium on Intelligent Distributed Computing (IDC'2014), Madrid, September 3-5, 2014.
    Forecasting solar energy is becoming an important issue in the context of renewable energy sources, and machine learning algorithms play an important role in this field. The prediction of solar energy can be addressed as a time series prediction problem using historical data, or it can be derived from numerical weather prediction (NWP) models. Our interest is focused on the latter approach. We focus on the problem of predicting solar energy from NWP output computed by GEFS, the Global Ensemble Forecast System, which predicts meteorological variables for points in a grid. In this context, it is useful to know how prediction accuracy improves with the number of grid nodes used as input to the machine learning techniques. However, using the variables from a large number of grid nodes can result in many attributes, which might degrade the generalization performance of the learning algorithms. In this paper, both issues are studied using data supplied by Kaggle for the State of Oklahoma, comparing Support Vector Machines and Gradient Boosted Regression. Three different feature selection methods have also been tested: linear correlation, the ReliefF algorithm, and a new method based on local information analysis.
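    A sketch of the experimental setup on synthetic stand-in data (the Kaggle GEFS/Oklahoma data is not reproduced here): many grid-node attributes, a linear-correlation filter keeping the top k, and a cross-validated comparison of Support Vector Regression against Gradient Boosted Regression. All sizes and hyperparameters are illustrative.

```python
# Grid-node feature selection + model comparison sketch: correlation filter,
# then cross-validated MAE for SVR vs. gradient boosted regression.
import numpy as np
from sklearn.svm import SVR
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 500, 120                            # p plays the role of grid-node features
X = rng.normal(size=(n, p))
y = X[:, :10] @ rng.normal(size=10) + 0.1 * rng.normal(size=n)

# Linear-correlation filter: keep the k attributes most correlated with y.
k = 20
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
top = np.argsort(-corr)[:k]
Xs = X[:, top]

svr = make_pipeline(StandardScaler(), SVR(C=10.0))
gbr = GradientBoostingRegressor(n_estimators=200, max_depth=3)
for name, model in [("SVR", svr), ("GBR", gbr)]:
    mae = -cross_val_score(model, Xs, y, cv=5,
                           scoring="neg_mean_absolute_error").mean()
    print(f"{name}: cross-validated MAE = {mae:.3f}")
```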

    Gastric cancer and Helicobacter pylori: a combined analysis of 12 case control studies nested within prospective cohorts

    BACKGROUND: The magnitude of the association between Helicobacter pylori and incidence of gastric cancer is unclear. H pylori infection and the circulating antibody response can be lost with development of cancer; thus retrospective studies are subject to bias resulting from classification of cases as H pylori negative when they were infected in the past. AIMS: To combine data from all case-control studies nested within prospective cohorts to assess more reliably the relative risk of gastric cancer associated with H pylori infection, and to investigate variation in relative risk by age, sex, cancer type and subsite, and interval between blood sampling and cancer diagnosis. METHODS: Studies were eligible if blood samples for H pylori serology were collected before diagnosis of gastric cancer in cases. Identified published studies and two unpublished studies were included, and individual subject data were obtained for each. Matched odds ratios (ORs) and 95% confidence intervals (95% CIs) were calculated for the association between H pylori and gastric cancer. RESULTS: Twelve studies with 1228 gastric cancer cases were considered. The association with H pylori was restricted to non-cardia cancers (OR 3.0; 95% CI 2.3–3.8) and was stronger when blood samples for H pylori serology were collected 10+ years before cancer diagnosis (5.9; 3.4–10.3). H pylori infection was not associated with an altered overall risk of cardia cancer (1.0; 0.7–1.4). CONCLUSIONS: These results suggest that 5.9 is the best estimate of the relative risk of non-cardia cancer associated with H pylori infection and that H pylori does not increase the risk of cardia cancer. They also support the idea that when H pylori status is assessed close to cancer diagnosis, the magnitude of the non-cardia association may be underestimated.
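    For readers unfamiliar with the matched odds ratios quoted above, a worked sketch of the simplest 1:1 matched case-control computation: the OR is the ratio of discordant pair counts b/c, with a Wald interval on the log scale. The counts below are illustrative, not the study's data.

```python
# Matched-pairs odds ratio: only discordant pairs are informative, and the
# standard error of log(OR) is sqrt(1/b + 1/c).
import math

b = 180   # pairs: case H. pylori positive, control negative (illustrative)
c = 60    # pairs: case negative, control positive (illustrative)

or_hat = b / c
se_log = math.sqrt(1 / b + 1 / c)
lo = math.exp(math.log(or_hat) - 1.96 * se_log)
hi = math.exp(math.log(or_hat) + 1.96 * se_log)
print(f"matched OR = {or_hat:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```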